Hi Daniel,

Daniel Veillard wrote:
> On Fri, Jul 27, 2007 at 11:06:36AM +0200, Stefan Behnel wrote:
>> one of the lxml users noticed that libxml2 changes behaviour when you set the
>> NONET option for xmlCtxtReadFile() and then call it twice on a network URL.
>> The first time, it parses the external document. The second time, it refuses
>> to parse it.
>>
>> The problem lies in the handling of the parser options, which are only set
>> *after* the first call to xmlLoadExternalEntity(), in the following call to
>> xmlDoRead(). I think this is ok in general as it allows users to parse from a
>> URL by passing it in but to avoid additional network access when loading
>> external entities transitively (DTDs etc.) - is this the intended semantics
>> of the NONET option?
>
> Hum, no. The NONET semantic is that any access outside the local filesystem
> should generate an error. Note that if you have a catalog remapping external
> resources to local ones, then they should proceed without failure.

Sounds like a bug then. But I actually find that behaviour useful. You can
check yourself if the URL you want to parse is a network URL, but you can't
easily check if external entities in the respective document come from the
network. So the current behaviour allows you to be more selective in what you
want to restrict.

>> Depending on how contexts are reused in an application, this can lead to
>> unpredictable behaviour. In lxml, we can work around this by resetting the
>> context options after parsing, but I would like to see the intended semantics
>> of the NONET options cleared up and see reliable behaviour here.
>
> In general you should always reset the parsing context, like xmlCtxtRead*
> function do.

Right, they already do that. So the problem is not resetting the context, the
problem is a difference in behaviour if the options were already set on the
context or not. It just leaks state.

Stefan
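
P.S. To illustrate what I mean by checking the top-level URL yourself, here is
a minimal sketch in plain Python (standard library only, no libxml2 or lxml
calls). The helper name is_network_url and the set of schemes treated as
"network" (http, https, ftp) are assumptions made for this example, not
anything libxml2 defines. Note that this only covers the URL you pass in; it
cannot tell you where external entities referenced by the document come from.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: these schemes count as "network" access.
NETWORK_SCHEMES = frozenset({"http", "https", "ftp"})


def is_network_url(url):
    """Return True if the URL's scheme suggests it is fetched over the network.

    Plain paths (no scheme) and file: URLs are treated as local.
    """
    # urlsplit() lowercases the scheme, so "HTTP://..." is handled as well.
    return urlsplit(url).scheme in NETWORK_SCHEMES


if __name__ == "__main__":
    for candidate in ("http://example.com/doc.xml",
                      "file:///tmp/doc.xml",
                      "/tmp/doc.xml"):
        print(candidate, is_network_url(candidate))
```

An application could run such a check before handing the URL to the parser and
decide for itself whether to allow the top-level fetch.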
