On Wed, Apr 25, 2007 at 11:19:36AM +0100, Richard W.M. Jones wrote:
> Daniel Veillard wrote:
> >>The current uri->query field is always unescaped during parsing. I have
> >>changed it so that it is always stored in its raw form. This is because
> >>otherwise it's impossible to parse query strings such as
> >>file:///tmp/test.html?test=%26&second=%26, which can be generated by web
> >>browsers. If anyone was relying on the current semantics, then it seems
> >>to me that they cannot parse such query strings correctly.
> >
> >  Aside from the number of new APIs available there, that's my main
> >issue with the patch: you are changing the default behaviour of
> >functionality that has been exposed practically forever.
> >  I guess I would really prefer an approach which hooks into the URI
> >parsing itself and fills in an extra list of values (or rather an array
> >of xmlChar *, alternatively names and values) in the xmlURI structure,
> >at the end. That would allow us to keep the uri->query data as it was,
> >and still provide the functionality you suggest, based on a preparsed
> >xmlURIPtr. This would also avoid adding an extra list type.
>
> OK, so I'll rework this to integrate it into the normal parsing and
> saving of URIs and put the results in the URI structure. (Is that right?)
  yes.

> uri->query really must be deprecated though!

  There is basically no deprecation possible in libxml2, and I have no
plans so far for libxml3, so ...

> >I'm not sure about the ignore flag in that list; what is it
> >used for?
>
> So this is really useful in the situation that I'm actually using this
> for: I want to parse the URI, remove some of the parameters (basically
> the ones which my code understands) and leave the rest of the parameters
> in place for another piece of code down the line to use.
>
> Now, removing a parameter from a linked list is annoyingly complicated,
> but setting a flag to say "ignore this parameter - I've seen it" is a
> lot easier. On the other hand, if the complexity is hidden inside a
> uri.c function then that doesn't matter (so long as it works :-)

  okay,

> > - those simplified APIs would work immediately with the Python
> >   generator, which would then not run into char ** arguments that it
> >   can't handle automatically.
>
> So the use of char ** to return values is there because I couldn't see a
> good way to return an error indication.
>
> As an example, if xmlURIQueryGetSingle were defined as:
>
>   char *xmlURIQueryGetSingle (xmlURIPtr uri, const char *name);
>
> then returning NULL might mean either (1) there is no field with that
> name, or (2) there was an error, e.g. in memory allocation.

  No memory allocation error should occur at that stage: you will return
a const char * pointing into the array held by the xmlURIPtr. The lifetime
of that value will be the same as that of the xmlURI, which is IMHO a fine
way to do things.

> About the use of char vs xmlChar: I couldn't really see which one was
> correct. I understand that xmlChar is unsigned because of some bogosity
> in the XML spec, so xmlChar is used for characters in XML documents.
> URIs are different though, so which should I be using?

  xmlChar is unsigned because of the bogosity of C strings, which have no
associated encoding (and no, the current locale is not a decent answer for
XML processing).
xmlChar * means a UTF-8 encoded string; char * basically means "we don't
know the encoding". Considering the URI vs. IRI disaster (sorry, I don't
have the IRI RFC number offhand), I assume we should stick to char * (or
rather const char *) for all of those APIs.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
                     | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
http://mail.gnome.org/mailman/listinfo/xml
