Hi,

one of the lxml users noticed that libxml2 behaves inconsistently when you pass the NONET option (XML_PARSE_NONET) to xmlCtxtReadFile() and call it twice on a network URL with the same parser context. The first call fetches and parses the remote document. The second call refuses to load it.

The problem lies in how the parser options are handled: they are only set in xmlDoRead(), *after* the first call to xmlLoadExternalEntity() has already loaded the document. I think this is fine in general, since it lets users parse from a URL that they pass in explicitly while still preventing additional network access when external entities (DTDs etc.) are loaded transitively. Is this the intended semantics of the NONET option?

The catch is that when you reuse the parser context, the options *stay* in the context. On the second run of xmlCtxtReadFile() they are therefore already in place when xmlLoadExternalEntity() is called, and the load is refused. Depending on how contexts are reused in an application, this can lead to unpredictable behaviour.

In lxml, we can work around this by resetting the context options after parsing, but I would like to see the intended semantics of the NONET option clarified and the behaviour made reliable.

Stefan
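
To make the sequence concrete, here is a toy model in Python. It is not libxml2 code and does not call libxml2 or lxml. It only mimics the ordering described above: the load step checks the options currently stored in the context, and the requested options are applied afterwards. All names (ToyContext, load_external_entity, do_read, ctxt_read_file, try_read) and the NONET bit value are made up for illustration.

```python
from dataclasses import dataclass

NONET = 1 << 0  # arbitrary bit for this model, not taken from libxml2


class NetworkAccessRefused(Exception):
    """Raised by the model when a network load is blocked."""


@dataclass
class ToyContext:
    # Options persist across reads, like the options of a reused context.
    options: int = 0


def load_external_entity(ctxt, url):
    # Checks whatever options are stored in the context *right now*.
    if ctxt.options & NONET and url.startswith(("http://", "ftp://")):
        raise NetworkAccessRefused(url)
    return f"<contents of {url}>"


def do_read(ctxt, data, options):
    # The requested options only reach the context at this point.
    ctxt.options |= options
    return data


def ctxt_read_file(ctxt, url, options):
    data = load_external_entity(ctxt, url)  # step 1: load using the old options
    return do_read(ctxt, data, options)     # step 2: apply the new options


def try_read(ctxt, url, options):
    try:
        ctxt_read_file(ctxt, url, options)
        return "parsed"
    except NetworkAccessRefused:
        return "refused"


URL = "http://example.com/doc.xml"  # placeholder URL, never fetched

# Reusing one context: the second read sees NONET left over from the first.
ctxt = ToyContext()
results = [try_read(ctxt, URL, NONET), try_read(ctxt, URL, NONET)]

# Workaround as described above: reset the options after each parse.
ctxt2 = ToyContext()
results_with_reset = []
for _ in range(2):
    results_with_reset.append(try_read(ctxt2, URL, NONET))
    ctxt2.options = 0

print(results)
print(results_with_reset)
```

In this model, the reused context refuses the second read, while resetting the options after each parse makes both reads behave the same way. Whether the first read should be allowed at all under NONET is exactly the open question about the intended semantics.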
